Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report FQDN on host.name as opt-in, enabled by feature flag #2218

Merged
merged 101 commits into from
Apr 4, 2023

Conversation

AndersonQ
Copy link
Member

@AndersonQ AndersonQ commented Feb 1, 2023

What does this PR do?

This PR enhances Elastic Agent to understand feature flags configuration, supplied either via standalone policy or Fleet policy, and applies said configuration to itself as well as to its components.

Why is it important?

To implement the first feature flag, FQDN. Enabling the FQDN feature flag will allow Agent and its components to report their host's fully-qualified domain name via the host.name field in events generated by the components.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

Author's Checklist

Manual testing of Agent in Standalone mode

Without FQDN feature flag enabled

agent_standalone_fqdn_disabled.mov

With FQDN feature flag enabled

agent_standalone_fqdn_enabled.mov

How to test this PR locally

The screen recordings in the section above were created by following the same steps as listed below.

  1. Build Agent package.
  2. Spin up a VM, setting hostname to a FQDN, e.g. foo.bar.baz.
  3. Copy Agent package into VM.
  4. Setup an ES + (optionally) Kibana deployment.
  5. Setup two standalone Agent configurations, one with the FQDN feature flag not enabled and one with it enabled. In both, point the output to the ES cluster in the deployment you created.
  6. Start the Agent with the first configuration (FQDN feature flag not enabled).
  7. After a few seconds, check ES (or Kibana Discover) for data. Ensure that host.name and host.hostname are both set to the short hostname, e.g. foo in all documents.
  8. Stop Agent.
  9. Start the Agent with the second configuration (FQDN feature flag enabled).
  10. After a few seconds, check ES (or Kibana Discover) for data. Ensure that host.name is set to the FQDN, e.g. foo.bar.baz while host.hostname is set to the short hostname, e.g. foo, in all documents.
  11. Stop Agent.

Related issues

Use cases

Screenshots

Logs

@mergify
Copy link
Contributor

mergify bot commented Feb 1, 2023

This pull request does not have a backport label. Could you fix it @AndersonQ? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
Copy link
Contributor

elasticmachine commented Feb 1, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-04-04T14:00:06.512+0000

  • Duration: 19 min 11 sec

Test stats 🧪

Test Results
Failed 0
Passed 5387
Skipped 23
Total 5410

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@AndersonQ AndersonQ force-pushed the 2185-fqdn-feature-flag branch 2 times, most recently from 2d9504f to 1312b91 Compare February 6, 2023 15:37
@mergify
Copy link
Contributor

mergify bot commented Feb 13, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 2185-fqdn-feature-flag upstream/2185-fqdn-feature-flag
git merge upstream/main
git push upstream 2185-fqdn-feature-flag

1 similar comment
@mergify
Copy link
Contributor

mergify bot commented Feb 27, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 2185-fqdn-feature-flag upstream/2185-fqdn-feature-flag
git merge upstream/main
git push upstream 2185-fqdn-feature-flag

@AndersonQ AndersonQ force-pushed the 2185-fqdn-feature-flag branch from b8eb1a0 to 8dc9168 Compare February 27, 2023 20:15
@elasticmachine
Copy link
Contributor

elasticmachine commented Feb 27, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.462% (64/65) 👍 0.024
Files 70.0% (154/220) 👍 0.137
Classes 68.675% (285/415) 👍 0.152
Methods 54.069% (877/1622) 👍 0.075
Lines 39.331% (9790/24891) 👍 0.048
Conditionals 100.0% (0/0) 💚

@mergify
Copy link
Contributor

mergify bot commented Mar 9, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 2185-fqdn-feature-flag upstream/2185-fqdn-feature-flag
git merge upstream/main
git push upstream 2185-fqdn-feature-flag

@ycombinator
Copy link
Contributor

I'm starting to look at this PR now. For starters, I'm just going to clean it up a bit like so:

  • there appear to be several formatting changes in this PR that are unrelated to it's primary purpose. Just to make the changes easier to parse and review, I'm going to extract these formatting changes into their own separate PR.
  • at this point I've built Agent using this PR and its sister PRs relating to the FQDN feature, and manually tested that Agent with Fleet, end-to-end, via as many as 11 policy changes. Agent has behaved as expected in each test case. So I'm going to remove some of the logging that was introduced / up-leveled in this PR, presumably for debugging purposes.

@ycombinator ycombinator mentioned this pull request Mar 9, 2023
6 tasks
@ycombinator
Copy link
Contributor

there appear to be several formatting changes in this PR that are unrelated to it's primary purpose. Just to make the changes easier to parse and review, I'm going to extract these formatting changes into their own separate PR.

Cleanup PR is up: #2361. Once it's merged, I'll rebase this PR here on main so it only contains changes relevant to the FQDN feature flag, hopefully making it easier to review.

@ycombinator
Copy link
Contributor

ycombinator commented Mar 20, 2023

Even after #2361, there are still some changes left in this PR here (accounting for nearly 50% LOC) that aren't directly related to implementing the FQDN feature. They are related to refactoring the fake component and shipper.

To make this PR here even easier to review, I've extracted these refactoring changes into it's own smaller PR: #2378. Once that PR is reviewed and merged, I'll rebase this PR here on main so it contains only FQDN-related changes in it, hopefully making it easier to review.

@mergify
Copy link
Contributor

mergify bot commented Mar 20, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 2185-fqdn-feature-flag upstream/2185-fqdn-feature-flag
git merge upstream/main
git push upstream 2185-fqdn-feature-flag

@ycombinator ycombinator force-pushed the 2185-fqdn-feature-flag branch 3 times, most recently from fa47fec to a39e14e Compare March 20, 2023 20:47
@ycombinator ycombinator added enhancement New feature or request backport-v8.7.0 Automated backport with mergify and removed backport-skip labels Mar 21, 2023
@ycombinator ycombinator force-pushed the 2185-fqdn-feature-flag branch from 7f2d220 to ef24d66 Compare April 3, 2023 16:05
@jlind23
Copy link
Contributor

jlind23 commented Apr 4, 2023

@ycombinator Looks like we are good to know with this PR now right? 😄

@ycombinator
Copy link
Contributor

@ycombinator Looks like we are good to know with this PR now right? 😄

No, still one issue remaining: #2218 (comment). I'm still trying to figure this out myself but have also pinged @cmacknz and @blakerouse (in Slack) for help.

@@ -199,6 +201,16 @@ func (i *AgentInfo) ECSMetadataFlatMap() (map[string]interface{}, error) {
}

info := sysInfo.Info()
hostname := info.Hostname
l.Debugf("in ECSMetadataFlatMap, hostname = %s", hostname)
Copy link
Contributor

@ycombinator ycombinator Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added this line for debugging purposes, so we could see debug-level logs from this method regardless of whether the FQDN feature flag was set or not. Will remove before this PR is merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to remove this line in my excitement to merge this PR, so here's the follow up PR to remove it: #2451

sysInfo, err := sysinfo.Host()
if err != nil {
return nil, err
}

info := sysInfo.Info()
hostname := info.Hostname
l.Debugf("in ECSMetadata, hostname = %s", hostname)
Copy link
Contributor

@ycombinator ycombinator Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added this line for debugging purposes, so we could see debug-level logs from this method regardless of whether the FQDN feature flag was set or not. Will remove before this PR is merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to remove this line in my excitement to merge this PR, so here's the follow up PR to remove it: #2451

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the remaining issue with the log level was not introduced in this PR. Shaunak is going to file an issue to track the problem.

With that out of the way this looks good to me. Thanks for all the effort here!

@ycombinator ycombinator merged commit be818cc into elastic:main Apr 4, 2023
mergify bot pushed a commit that referenced this pull request Apr 4, 2023
* wip

* wip

* wip

* clean up

* wip

* .

* agent.name as fqdn

* it works

* wip

* WIP - using fake input, created test, it seems to work

* finish test for feature flags

* remove debug log

* some clean up

* mage check

* fixing licenses

* it works

* it works

* remove debug logs and comments

* make notice

* fake component and shipper are built by TestMain

* fixing tests

* make notice

* adjust tests

* update elatic-agent-client

* add .exe for win binaries

* adjust fake components build

* increase test timeout

* fix tests

* adjust features pkg and don't send featuresFlag if nil

* it works

* make notice

* revert some changes

* handle error properly

* add TODO

* Set FQDN for Vagrant elastic-agent VM

Useful for testing FQDN feature; harmless otherwise

* Undoing trivial formatting changes

* Running mage fmt

* Remove personal repo from NOTICE overrides

* Sorting imports

* Fix error handling

* Fixing access

* Fixing packaging of fake component

* Update logging

* Refactoring fake component

* Allowing fmt.Print* in TestMain

* Remove redundant return

* Fixing typo in error message

* Undo unintentional changes

* Removing redundant TestMain file

* Add comment on FQDN to Vagrantfile

* Remove debug logging statements

* Making test pass

* Document feature flags configuration in reference config file

* Adding CHANGELOG entry

* Use convenience method

* Formatting

* Use convenience method from elastic-agent-libs

* Running mage update

* Fixing imports

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Restore timeout duration

* Move mutex inside struct

* Prevent mutex copying

* Fixing state diagnostics test

* Fixing components diagnostics test

* Adding features.source to components golden file

* More interesting diagnostics state test

* Log error if FQDN lookup fails and fallback to OS-reported hostname

* Bump up go-sysinfo dependency version

* Fix typos

* Fixing compile errors

* Fix tests

* Fix imports

* Fix issues lost in rebase

* Fixing more imports

* Updating NOTICE.txt

* Clarify features indentation

* Remove replace directive

* Bumping up version on elastic-agent-client dependency

* Adding test mocking DNS

* Initialize featuresIdx in same place as units' configIdx

* Updating NOTICE.txt

* Remove unnecessary nil check

* Set features in state initially

* Update features in state from checkin observed message

* Remove TODO

* Updating elatic-agent.yml test fixture to enable FQDN feature flag

* Try flipping the authoritative flag

* Remove unreliable DNS mocking test

* Set source

* Update test fixture

* Change logger

* Fixing test case

* Omit feature flags serialization from state.yaml diag file

* Updating NOTICE.txt

* Add descriptive comment on why Features is not being serialized as YAML

* Updating test fixture

* Inject logger into metadata methods

---------

Co-authored-by: Shaunak Kashyap <[email protected]>
(cherry picked from commit be818cc)

# Conflicts:
#	internal/pkg/agent/application/application.go
#	internal/pkg/agent/application/coordinator/coordinator.go
#	internal/pkg/agent/application/coordinator/testdata/simple_config/expected/state.yaml
ycombinator pushed a commit that referenced this pull request Apr 5, 2023
* wip

* wip

* wip

* clean up

* wip

* .

* agent.name as fqdn

* it works

* wip

* WIP - using fake input, created test, it seems to work

* finish test for feature flags

* remove debug log

* some clean up

* mage check

* fixing licenses

* it works

* it works

* remove debug logs and comments

* make notice

* fake component and shipper are built by TestMain

* fixing tests

* make notice

* adjust tests

* update elatic-agent-client

* add .exe for win binaries

* adjust fake components build

* increase test timeout

* fix tests

* adjust features pkg and don't send featuresFlag if nil

* it works

* make notice

* revert some changes

* handle error properly

* add TODO

* Set FQDN for Vagrant elastic-agent VM

Useful for testing FQDN feature; harmless otherwise

* Undoing trivial formatting changes

* Running mage fmt

* Remove personal repo from NOTICE overrides

* Sorting imports

* Fix error handling

* Fixing access

* Fixing packaging of fake component

* Update logging

* Refactoring fake component

* Allowing fmt.Print* in TestMain

* Remove redundant return

* Fixing typo in error message

* Undo unintentional changes

* Removing redundant TestMain file

* Add comment on FQDN to Vagrantfile

* Remove debug logging statements

* Making test pass

* Document feature flags configuration in reference config file

* Adding CHANGELOG entry

* Use convenience method

* Formatting

* Use convenience method from elastic-agent-libs

* Running mage update

* Fixing imports

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Restore timeout duration

* Move mutex inside struct

* Prevent mutex copying

* Fixing state diagnostics test

* Fixing components diagnostics test

* Adding features.source to components golden file

* More interesting diagnostics state test

* Log error if FQDN lookup fails and fallback to OS-reported hostname

* Bump up go-sysinfo dependency version

* Fix typos

* Fixing compile errors

* Fix tests

* Fix imports

* Fix issues lost in rebase

* Fixing more imports

* Updating NOTICE.txt

* Clarify features indentation

* Remove replace directive

* Bumping up version on elastic-agent-client dependency

* Adding test mocking DNS

* Initialize featuresIdx in same place as units' configIdx

* Updating NOTICE.txt

* Remove unnecessary nil check

* Set features in state initially

* Update features in state from checkin observed message

* Remove TODO

* Updating elatic-agent.yml test fixture to enable FQDN feature flag

* Try flipping the authoritative flag

* Remove unreliable DNS mocking test

* Set source

* Update test fixture

* Change logger

* Fixing test case

* Omit feature flags serialization from state.yaml diag file

* Updating NOTICE.txt

* Add descriptive comment on why Features is not being serialized as YAML

* Updating test fixture

* Inject logger into metadata methods

---------

Co-authored-by: Shaunak Kashyap <[email protected]>
(cherry picked from commit be818cc)
ycombinator pushed a commit that referenced this pull request Apr 6, 2023
* wip

* wip

* wip

* clean up

* wip

* .

* agent.name as fqdn

* it works

* wip

* WIP - using fake input, created test, it seems to work

* finish test for feature flags

* remove debug log

* some clean up

* mage check

* fixing licenses

* it works

* it works

* remove debug logs and comments

* make notice

* fake component and shipper are built by TestMain

* fixing tests

* make notice

* adjust tests

* update elatic-agent-client

* add .exe for win binaries

* adjust fake components build

* increase test timeout

* fix tests

* adjust features pkg and don't send featuresFlag if nil

* it works

* make notice

* revert some changes

* handle error properly

* add TODO

* Set FQDN for Vagrant elastic-agent VM

Useful for testing FQDN feature; harmless otherwise

* Undoing trivial formatting changes

* Running mage fmt

* Remove personal repo from NOTICE overrides

* Sorting imports

* Fix error handling

* Fixing access

* Fixing packaging of fake component

* Update logging

* Refactoring fake component

* Allowing fmt.Print* in TestMain

* Remove redundant return

* Fixing typo in error message

* Undo unintentional changes

* Removing redundant TestMain file

* Add comment on FQDN to Vagrantfile

* Remove debug logging statements

* Making test pass

* Document feature flags configuration in reference config file

* Adding CHANGELOG entry

* Use convenience method

* Formatting

* Use convenience method from elastic-agent-libs

* Running mage update

* Fixing imports

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Restore timeout duration

* Move mutex inside struct

* Prevent mutex copying

* Fixing state diagnostics test

* Fixing components diagnostics test

* Adding features.source to components golden file

* More interesting diagnostics state test

* Log error if FQDN lookup fails and fallback to OS-reported hostname

* Bump up go-sysinfo dependency version

* Fix typos

* Fixing compile errors

* Fix tests

* Fix imports

* Fix issues lost in rebase

* Fixing more imports

* Updating NOTICE.txt

* Clarify features indentation

* Remove replace directive

* Bumping up version on elastic-agent-client dependency

* Adding test mocking DNS

* Initialize featuresIdx in same place as units' configIdx

* Updating NOTICE.txt

* Remove unnecessary nil check

* Set features in state initially

* Update features in state from checkin observed message

* Remove TODO

* Updating elatic-agent.yml test fixture to enable FQDN feature flag

* Try flipping the authoritative flag

* Remove unreliable DNS mocking test

* Set source

* Update test fixture

* Change logger

* Fixing test case

* Omit feature flags serialization from state.yaml diag file

* Updating NOTICE.txt

* Add descriptive comment on why Features is not being serialized as YAML

* Updating test fixture

* Inject logger into metadata methods

---------

Co-authored-by: Shaunak Kashyap <[email protected]>
(cherry picked from commit be818cc)
ycombinator added a commit that referenced this pull request Apr 6, 2023
…feature flag (#2449)

* Report FQDN on host.name as opt-in, enabled by feature flag (#2218)

* wip

* wip

* wip

* clean up

* wip

* .

* agent.name as fqdn

* it works

* wip

* WIP - using fake input, created test, it seems to work

* finish test for feature flags

* remove debug log

* some clean up

* mage check

* fixing licenses

* it works

* it works

* remove debug logs and comments

* make notice

* fake component and shipper are built by TestMain

* fixing tests

* make notice

* adjust tests

* update elatic-agent-client

* add .exe for win binaries

* adjust fake components build

* increase test timeout

* fix tests

* adjust features pkg and don't send featuresFlag if nil

* it works

* make notice

* revert some changes

* handle error properly

* add TODO

* Set FQDN for Vagrant elastic-agent VM

Useful for testing FQDN feature; harmless otherwise

* Undoing trivial formatting changes

* Running mage fmt

* Remove personal repo from NOTICE overrides

* Sorting imports

* Fix error handling

* Fixing access

* Fixing packaging of fake component

* Update logging

* Refactoring fake component

* Allowing fmt.Print* in TestMain

* Remove redundant return

* Fixing typo in error message

* Undo unintentional changes

* Removing redundant TestMain file

* Add comment on FQDN to Vagrantfile

* Remove debug logging statements

* Making test pass

* Document feature flags configuration in reference config file

* Adding CHANGELOG entry

* Use convenience method

* Formatting

* Use convenience method from elastic-agent-libs

* Running mage update

* Fixing imports

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Reducing whitespace changes

* Restore timeout duration

* Move mutex inside struct

* Prevent mutex copying

* Fixing state diagnostics test

* Fixing components diagnostics test

* Adding features.source to components golden file

* More interesting diagnostics state test

* Log error if FQDN lookup fails and fallback to OS-reported hostname

* Bump up go-sysinfo dependency version

* Fix typos

* Fixing compile errors

* Fix tests

* Fix imports

* Fix issues lost in rebase

* Fixing more imports

* Updating NOTICE.txt

* Clarify features indentation

* Remove replace directive

* Bumping up version on elastic-agent-client dependency

* Adding test mocking DNS

* Initialize featuresIdx in same place as units' configIdx

* Updating NOTICE.txt

* Remove unnecessary nil check

* Set features in state initially

* Update features in state from checkin observed message

* Remove TODO

* Updating elatic-agent.yml test fixture to enable FQDN feature flag

* Try flipping the authoritative flag

* Remove unreliable DNS mocking test

* Set source

* Update test fixture

* Change logger

* Fixing test case

* Omit feature flags serialization from state.yaml diag file

* Updating NOTICE.txt

* Add descriptive comment on why Features is not being serialized as YAML

* Updating test fixture

* Inject logger into metadata methods

---------

Co-authored-by: Shaunak Kashyap <[email protected]>
(cherry picked from commit be818cc)

* Fixing return args

---------

Co-authored-by: Anderson Queiroz <[email protected]>
Co-authored-by: Shaunak Kashyap <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.7.0 Automated backport with mergify enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report FQDN on host.name enabled by a feature flag
8 participants